This document is a short analysis of the relationship between apartment prices and ground living area. It is associated with this publication. This work is hosted in this GitHub repository.

1 Libraries


See the code for a list of necessary libraries.

library(tidyverse)
library(rmarkdown)    # You need this library to run this template.
library(epuRate)      # Install with devtools: install_github("holtzy/epuRate", force=TRUE)
library(plotly)       # Turn ggplot2 charts into interactive plots
library(hrbrthemes)   # For good looking plots
library(DT)           # To show tables
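If one of these packages is missing, knitting stops at the first `library()` call that fails. The following is a minimal sketch, not part of the original template, that lists which of the packages above are not yet installed using base R's `requireNamespace()`. CRAN packages can then be installed with `install.packages()`, and epuRate with the devtools command given in the comment above. The variable names are hypothetical.

```r
# Hypothetical check: which of the packages loaded in this section are not installed?
required_pkgs <- c("tidyverse", "rmarkdown", "epuRate", "plotly", "hrbrthemes", "DT")

# requireNamespace() returns TRUE/FALSE instead of throwing an error
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required_pkgs[!is_installed]

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```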

2 Data preprocessing


Data often need some preparation. Here I only keep the two columns of interest (ground living area and sale price) and select 100 random rows.

# Load the dataset from the local file DATA/data.csv and keep the two columns of interest
data <- read.table("DATA/data.csv", header=TRUE, sep=",") %>% dplyr::select(GrLivArea, SalePrice)

# Keep 100 randomly chosen rows
data <- data %>% slice_sample(n=100)
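Because the 100 rows are drawn at random, the plot and table change every time the document is knitted. The sketch below, which assumes a hypothetical toy data frame rather than the real CSV, shows how fixing the random seed with `set.seed()` makes such a draw reproducible; the seed value 123 is arbitrary. Calling `set.seed()` just before the sampling step above should have the same effect on the real data.

```r
# Hypothetical toy data frame standing in for the real dataset (500 rows)
toy <- data.frame(GrLivArea = seq(500, 5490, by = 10))
toy$SalePrice <- toy$GrLivArea * 100

# Draw 100 rows twice, resetting the same (arbitrary) seed before each draw
set.seed(123)
sample_a <- toy[sample(nrow(toy), 100), ]
set.seed(123)
sample_b <- toy[sample(nrow(toy), 100), ]

# Same seed, same rows
identical(sample_a, sample_b)
#> [1] TRUE
```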

3 Analysis


The following interactive scatterplot shows how sale price varies with ground living area (hover over a point to display its tooltip):

# Plot: one point per apartment, with a custom tooltip stored in the "text" column
p <- data %>% 
  mutate(text=paste("Apartment Number: ", seq_len(nrow(data)), "\nLocation: New York\nAny info you need..", sep="")) %>%
  ggplot( aes(x=GrLivArea, y=SalePrice/1000, text=text)) +
    geom_point(color="#69b3a2", alpha=0.8) +
    ggtitle("Ground living area partially explains sale price of apartments") +
    theme_ipsum() +
    theme(
      plot.title = element_text(size=12)
    ) +
    ylab('Sale price (k$)') +
    xlab('Ground living area')

# Turn it interactive; only the "text" aesthetic is shown on hover
ggplotly(p, tooltip="text")
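The plot title states that ground living area only *partially* explains sale price. One way to put a number on "partially", which is not part of the original analysis, is to fit a simple linear regression and report its slope and R². The sketch below wraps this in a hypothetical helper `describe_relationship()` and checks it on made-up numbers. For a simple regression with an intercept, R² equals the squared Pearson correlation.

```r
# Hypothetical helper: summarise a linear fit of SalePrice on GrLivArea
describe_relationship <- function(df) {
  fit <- lm(SalePrice ~ GrLivArea, data = df)
  list(
    slope     = unname(coef(fit)["GrLivArea"]),  # price change per unit of living area
    r_squared = summary(fit)$r.squared,          # share of price variance explained
    pearson_r = cor(df$GrLivArea, df$SalePrice)  # linear correlation coefficient
  )
}

# Tiny worked example with made-up numbers
toy <- data.frame(GrLivArea = c(1, 2, 3, 4), SalePrice = c(2, 4, 5, 8))
res <- describe_relationship(toy)
res$slope      # 1.9
res$r_squared  # about 0.963
```

On the sampled data, `describe_relationship(data)` would return the corresponding values; an R² well below 1 is what "partially explains" would mean in this framing.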

4 Show raw data


In case you're interested in a particular data point, here is the complete input dataset (the 100 sampled rows used above):
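The DT package loaded in section 1 is presumably what renders this table. A minimal sketch, assuming `DT::datatable()` and using a hypothetical helper name; the display options (no row names, filters on top, 10 rows per page, horizontal scrolling) are chosen for illustration rather than taken from the original document.

```r
# Hypothetical helper: interactive, searchable table of a data frame
show_raw_data <- function(df) {
  DT::datatable(
    df,
    rownames = FALSE,                                 # hide the row index column
    filter   = "top",                                 # one filter box above each column
    options  = list(pageLength = 10, scrollX = TRUE)  # options passed to DataTables
  )
}
```

In the document, calling `show_raw_data(data)` in a chunk would display the 100 sampled rows.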

 

A work by Yan Holtz

yan.holtz.data@gmail.com